Check the Unicode block table: https://unicode-table.com/en/blocks/

Unicode is split into many blocks. Each block is a contiguous range of code points that groups the characters used by a particular script or language, for example:

0000-007F Basic Latin
0080-00FF Latin-1 Supplement
0100-017F Latin Extended-A
0180-024F Latin Extended-B

Checking which of these ranges a character falls into may prove more useful than relying on a default charset.
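A rough sketch of the range check, assuming the goal is to classify each character of a string by the block its code point belongs to. The `BLOCKS` table and the helper names `block_of` and `blocks_in` are hypothetical and cover only the four blocks listed above. A real implementation would load the full block list from the Unicode Character Database.

```python
from typing import Optional

# Hypothetical table: (first code point, last code point, block name).
# Only the four blocks mentioned above; extend as needed.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0100, 0x017F, "Latin Extended-A"),
    (0x0180, 0x024F, "Latin Extended-B"),
]


def block_of(ch: str) -> Optional[str]:
    """Return the name of the block containing ch, or None if it is not in BLOCKS."""
    cp = ord(ch)  # code point of the single character
    for start, end, name in BLOCKS:
        if start <= cp <= end:
            return name
    return None


def blocks_in(text: str) -> set:
    """Collect the block names used by text; unlisted characters map to "Unknown"."""
    return {block_of(ch) or "Unknown" for ch in text}


if __name__ == "__main__":
    # a, e-acute, o-double-acute, u-horn, Cyrillic Zhe
    for ch in "a\u00e9\u0151\u01b0\u0416":
        print(repr(ch), hex(ord(ch)), block_of(ch))
```

The linear scan is fine for a handful of blocks. With the full list of several hundred blocks, a binary search over the sorted start points (for example with `bisect`) would be the more natural choice.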